AITopics | weight transfer

Adapted Deep Embeddings: A Synthesis of Methods for k-Shot Inductive Transfer Learning

Neural Information Processing Systems, Mar-17-2026, 01:04:35 GMT

The focus in machine learning has branched beyond training classifiers on a single task to investigating how previously acquired knowledge in a source domain can be leveraged to facilitate learning in a related target domain, known as inductive transfer learning. Three active lines of research have independently explored transfer learning using neural networks. In weight transfer, a model trained on the source domain is used as an initialization point for a network to be trained on the target domain. In deep metric learning, the source domain is used to construct an embedding that captures class structure in both the source and target domains. In few-shot learning, the focus is on generalizing well in the target domain based on a limited number of labeled examples. We compare state-of-the-art methods from these three paradigms and also explore hybrid adapted-embedding methods that use limited target-domain data to fine-tune embeddings constructed from source-domain data. We conduct a systematic comparison of methods in a variety of domains, varying the number of labeled instances available in the target domain (k), as well as the number of target-domain classes. We reach three principal conclusions: (1) Deep embeddings are far superior, compared to weight transfer, as a starting point for inter-domain transfer or model re-use; (2) Our hybrid methods robustly outperform every few-shot learning and every deep metric learning method previously proposed, with a mean error reduction of 34% over state-of-the-art.
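The weight-transfer idea described in the abstract (initialize the target network from a source-trained model) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy list-of-layers model, the helper names `init_layers` and `weight_transfer`, the layer sizes, and the choice to re-initialize only the output head are all assumptions made for the example.

```python
import copy
import random

def init_layers(sizes, seed):
    """Randomly initialize a toy MLP as a list of (weights, biases) per layer."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def weight_transfer(source_layers, n_target_classes, seed=0):
    """Copy every source layer except the output head, then attach a fresh head."""
    # Deep copy so that later training on the target does not alter the source model.
    target_layers = copy.deepcopy(source_layers[:-1])
    # Input width of the old head equals the feature dimension of the copied body.
    feature_dim = len(source_layers[-1][0][0])
    head = init_layers([feature_dim, n_target_classes], seed)[0]
    target_layers.append(head)
    return target_layers

# Hypothetical sizes: 8 inputs, two hidden layers of 16 units, 10 source classes.
# `source` stands in for a model that has already been trained on the source domain.
source = init_layers([8, 16, 16, 10], seed=1)
target = weight_transfer(source, n_target_classes=5)
```

After this step the target network would be trained on the k labeled target examples per class; that training loop is omitted here.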

artificial intelligence, machine learning, proceedings, (7 more...)

Genre: Research Report (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.86)

5a9542c773018268fc6271f7afeea969-Supplemental.pdf

Neural Information Processing Systems, Feb-8-2026, 20:28:32 GMT

manuscript, new class, pascal voc 2012, (13 more...)

Country: Asia > South Korea > Seoul > Seoul (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Adapted Deep Embeddings: A Synthesis of Methods for k-Shot Inductive Transfer Learning

Neural Information Processing Systems, Nov-20-2025, 23:01:44 GMT

The focus in machine learning has branched beyond training classifiers on a single task to investigating how previously acquired knowledge in a source domain can be leveraged to facilitate learning in a related target domain, known as inductive transfer learning. Three active lines of research have independently explored transfer learning using neural networks. In weight transfer, a model trained on the source domain is used as an initialization point for a network to be trained on the target domain. In deep metric learning, the source domain is used to construct an embedding that captures class structure in both the source and target domains. In few-shot learning, the focus is on generalizing well in the target domain based on a limited number of labeled examples. We compare state-of-the-art methods from these three paradigms and also explore hybrid adapted-embedding methods that use limited target-domain data to fine-tune embeddings constructed from source-domain data. We conduct a systematic comparison of methods in a variety of domains, varying the number of labeled instances available in the target domain (k), as well as the number of target-domain classes. We reach three principal conclusions: (1) Deep embeddings are far superior, compared to weight transfer, as a starting point for inter-domain transfer or model re-use; (2) Our hybrid methods robustly outperform every few-shot learning and every deep metric learning method previously proposed, with a mean error reduction of 34% over state-of-the-art.

deep embedding, k-shot inductive transfer learning, target domain, (10 more...)

Genre: Research Report (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)

Adapted Deep Embeddings: A Synthesis of Methods for k-Shot Inductive Transfer Learning

Tyler Scott, Karl Ridgeway, Michael C. Mozer

Neural Information Processing Systems, Nov-20-2025, 20:13:02 GMT

Lempitsky, 2016) is most robust.

artificial intelligence, machine learning, target domain, (15 more...)

Country:

North America > United States > Colorado > Boulder County > Boulder (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Italy (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Vid2World: Crafting Video Diffusion Models to Interactive World Models

Huang, Siqiao, Wu, Jialong, Zhou, Qixing, Miao, Shangchen, Long, Mingsheng

arXiv.org Artificial Intelligence, Sep-30-2025

World models, which predict future transitions from past observation and action sequences, have shown great promise for improving data efficiency in sequential decision-making. However, existing world models often require extensive domain-specific training and still produce low-fidelity, coarse predictions, limiting their usefulness in complex environments. In contrast, video diffusion models trained on large-scale internet data have demonstrated impressive capabilities in generating high-quality videos that capture diverse real-world dynamics. In this work, we present Vid2World, a general approach for leveraging and transferring pre-trained video diffusion models into interactive world models. To bridge the gap, Vid2World systematically explores video diffusion causalization, reshaping both the architecture and training objective of pre-trained models to enable autoregressive generation. Additionally, it incorporates a causal action guidance mechanism to enhance action controllability in the resulting interactive world models. Extensive experiments across multiple domains, including robot manipulation, 3D game simulation, and open-world navigation, demonstrate that our method offers a scalable and effective pathway for repurposing highly capable video diffusion models into interactive world models.

diffusion model, large language model, machine learning, (18 more...)

arXiv:2505.14357

Country: Europe > Austria (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
(2 more...)

Supplementary Materials for SSUL: Semantic Segmentation with Unknown Label for Exemplar-based Class-Incremental Learning

Neural Information Processing Systems, Aug-14-2025, 16:33:35 GMT

One of our main target issues is the semantic drift between true background and future class, and we alleviate the issue using the unknown class modeling with saliency maps.

manuscript, new class, pascal voc 2012, (13 more...)

Country: Asia > South Korea > Seoul > Seoul (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Adapted Deep Embeddings: A Synthesis of Methods for k-Shot Inductive Transfer Learning

Neural Information Processing Systems, Oct-8-2024, 19:30:40 GMT

The focus in machine learning has branched beyond training classifiers on a single task to investigating how previously acquired knowledge in a source domain can be leveraged to facilitate learning in a related target domain, known as inductive transfer learning. Three active lines of research have independently explored transfer learning using neural networks. In weight transfer, a model trained on the source domain is used as an initialization point for a network to be trained on the target domain. In deep metric learning, the source domain is used to construct an embedding that captures class structure in both the source and target domains. In few-shot learning, the focus is on generalizing well in the target domain based on a limited number of labeled examples.

k-shot inductive transfer learning, target domain, weight transfer, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)

Reviews: Overcoming Catastrophic Forgetting by Incremental Moment Matching

Neural Information Processing Systems, Oct-8-2024, 13:27:54 GMT

Not including an objective evaluation of limitations is a flaw of this otherwise well-written paper, especially when the method relies crucially on weight transfer (as the authors point out outside the main paper, i.e., in the supplementary text and rebuttal). However, weight transfer is known to be an inadequate initialization technique between different problem classes, and the authors don't clearly address this issue, nor do they properly qualify the applicability of the method. On balance, this paper does give sufficient evidence that weight transfer and some form of parameter averaging are promising directions of future investigation, at least in a subset of interesting cases. The method is thoroughly benchmarked, in several incarnations, against state-of-the-art baselines on standard 'toy' problems defined on top of MNIST, as well as on the more challenging ImageNet2CUB and Lifelog datasets. A new parameterization, dubbed 'drop-transfer', is proposed as an alternative to standard weight initialization of model parameters on new tasks.
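The review mentions "some form of parameter averaging" without spelling out the scheme. As a generic illustration only, and not the reviewed paper's method, a weighted average of two models' parameters could look like the sketch below. The function name `average_parameters`, the dictionary layout, and the mixing ratio `alpha` are assumptions for the example.

```python
def average_parameters(params_a, params_b, alpha=0.5):
    """Return alpha * params_a + (1 - alpha) * params_b, parameter by parameter."""
    if params_a.keys() != params_b.keys():
        raise ValueError("both models must share the same parameter names")
    merged = {}
    for name in params_a:
        a, b = params_a[name], params_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name!r}")
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
    return merged

# Hypothetical flattened parameters of a model trained on an old and a new task.
old_task = {"w": [1.0, 2.0], "b": [0.0]}
new_task = {"w": [3.0, 4.0], "b": [1.0]}
merged = average_parameters(old_task, new_task, alpha=0.5)
```

Averaging of this kind is only meaningful when the two parameter sets are close in weight space, which is one reason it is usually paired with weight transfer: the new-task model starts from the old-task weights.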

matching, overcoming catastrophic forgetting, weight transfer, (5 more...)

Genre: Summary/Review (0.57)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.33)

A Mean Field Ansatz for Zero-Shot Weight Transfer

Chen, Xingyuan, Kuang, Wenwei, Deng, Lei, Han, Wei, Bai, Bo, Reis, Goncalo dos

arXiv.org Artificial Intelligence, Aug-16-2024

The pre-training cost of large language models (LLMs) is prohibitive. One cutting-edge approach to reduce the cost is zero-shot weight transfer, also known as model growth in some cases, which magically transfers the weights trained in a small model to a large model. However, there are still some theoretical mysteries behind weight transfer. In this paper, inspired by prior applications of mean field theory to neural network dynamics, we introduce a mean field ansatz to provide a theoretical explanation for weight transfer. Specifically, we propose the row-column (RC) ansatz under the mean field point of view, which describes the measure structure of the weights in the neural network (NN) and admits a close measure dynamic. Thus, the weights of NNs of different sizes admit a common distribution under proper assumptions, and weight transfer methods can be viewed as sampling methods. We empirically validate the RC ansatz by exploring simple MLP examples and LLMs such as GPT-3 and Llama-3.1. We show that the mean-field point of view is adequate under suitable assumptions, which can provide theoretical support for zero-shot weight transfer.

measure structure, rc ansatz, weight transfer, (15 more...)

arXiv:2408.08681

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation

Park, Yeachan, Kim, Minseok, Kim, Yeoneung

arXiv.org Artificial Intelligence, May-26-2024

We propose novel methodologies aimed at accelerating the grokking phenomenon, which refers to the rapid increase in test accuracy after a long period of overfitting, as reported by Power et al. (2022). Focusing on the grokking phenomenon that arises in learning arithmetic binary operations via the transformer model, we begin with a discussion on data augmentation in the case of commutative binary operations. To further accelerate, we elucidate arithmetic operations through the lens of the Kolmogorov-Arnold (KA) representation theorem, revealing its correspondence to the transformer architecture: embedding, decoder block, and classifier. Observing the shared structure between KA representations associated with binary operations, we suggest various transfer learning mechanisms that expedite grokking. This interpretation is substantiated through a series of rigorous experiments. In addition, our approach is successful in learning two nonstandard arithmetic tasks: composition of operations and a system of equations. Furthermore, we reveal that the model is capable of learning arithmetic operations using a limited number of tokens under embedding transfer, which is supported by a set of experiments as well.
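The abstract does not detail its data augmentation for commutative operations. One plausible reading, sketched here under that assumption, is to add the swapped operand pair for every training example, since a ∘ b = b ∘ a implies both orderings share a label. Modular addition, the modulus `P`, and the helper name `augment_commutative` are hypothetical choices for the example.

```python
def augment_commutative(examples):
    """Add the swapped example (b, a, c) for every (a, b, c), skipping duplicates."""
    seen = set(examples)
    augmented = list(examples)
    for a, b, c in examples:
        swapped = (b, a, c)
        if swapped not in seen:
            seen.add(swapped)
            augmented.append(swapped)
    return augmented

P = 7  # hypothetical modulus for the toy task a + b (mod P)

# A partial training set: only some pairs with a <= b are observed.
train = [(a, b, (a + b) % P) for a in range(P) for b in range(a, P, 2)]
augmented = augment_commutative(train)
```

Examples with a == b are their own swap and are not duplicated, so the augmented set grows only by the number of off-diagonal pairs.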

arithmetic operation, operation, representation, (15 more...)

arXiv:2405.16658

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.40)

Industry: Education > Curriculum > Subject-Specific Education (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
